transform_file task uses ack_late=True #881

Merged
merged 4 commits into from
Oct 15, 2024

BenGalewsky
Contributor

We found that when the autoscaler decides to scale down the number of transformers, the science image dies immediately, which causes the "sync" call to fail. When this happens we need to put the transform request back onto the RabbitMQ queue and die gracefully.

Add the acks_late property to the transform_file request, and treat a failure to receive the sync command from the science container as a fatal error.
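To illustrate why late acknowledgement matters here, the following is a minimal sketch that uses only the Python standard library. It is not the ServiceX code and does not use Celery or RabbitMQ; `ToyBroker`, `SidecarGone`, `run_worker` and the message name `file-1` are hypothetical names invented for the illustration. It compares acknowledging a message before the work runs with acknowledging it only after the work succeeds, when the worker dies in between.

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for a message queue (hypothetical, not RabbitMQ's API)."""

    def __init__(self, messages):
        self.ready = deque(messages)  # messages waiting for delivery
        self.unacked = {}             # delivery tag -> delivered but unacknowledged message
        self._next_tag = 0

    def deliver(self):
        """Hand the next ready message to a consumer and track it as unacked."""
        message = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """Consumer confirms the message, so the broker forgets it."""
        del self.unacked[tag]

    def connection_lost(self):
        """Consumer died: every unacked message returns to the front of the queue."""
        for tag in sorted(self.unacked, reverse=True):
            self.ready.appendleft(self.unacked.pop(tag))


class SidecarGone(Exception):
    """Raised when the simulated science container stops answering."""


def run_worker(broker, handler, acks_late):
    """Process one message, acknowledging before or after the handler runs."""
    tag, message = broker.deliver()
    if not acks_late:
        broker.ack(tag)           # early ack: the broker forgets the message now
    try:
        handler(message)
    except SidecarGone:
        broker.connection_lost()  # fatal error: the worker exits without acking
        return False
    if acks_late:
        broker.ack(tag)           # late ack: only after the work succeeded
    return True


def failing_transform(message):
    """Simulate the sync call failing because the science container is gone."""
    raise SidecarGone(f"sync failed for {message}")


early = ToyBroker(["file-1"])
run_worker(early, failing_transform, acks_late=False)

late = ToyBroker(["file-1"])
run_worker(late, failing_transform, acks_late=True)

print(list(early.ready))  # early ack: the request is lost
print(list(late.ready))   # late ack: the request is back on the queue
```

In Celery itself the equivalent switch is the `acks_late` task option; the sketch only models its effect on an unfinished request.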

1. Add a no-op SIGTERM handler to the object store uploader so it doesn't terminate
when the pod shuts down. It stays up and relies on the main Celery worker to send the poison
pill to the queue when it is time to go.

2. Set the Celery prefetch multiplier to one so we don't get greedy and grab more
transform requests than we can digest.
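The first item above can be sketched roughly as follows, assuming a POSIX system and using only the Python standard library. This is not the actual uploader code: `ignore_sigterm`, `uploader_loop`, the `POISON_PILL` sentinel and the file names are hypothetical. The sketch installs a no-op SIGTERM handler, sends SIGTERM to its own process to imitate the pod shutting down, and shows that the loop keeps draining work until the poison pill arrives.

```python
import os
import queue
import signal

POISON_PILL = None  # hypothetical sentinel meaning "no more work"

received = []  # signals seen by the handler, kept only for demonstration


def ignore_sigterm(signum, frame):
    """No-op handler: record the signal and keep running."""
    received.append(signum)


def uploader_loop(work_queue):
    """Drain the queue until the poison pill arrives; return what was 'uploaded'."""
    uploaded = []
    while True:
        item = work_queue.get()
        if item is POISON_PILL:
            break
        uploaded.append(item)
    return uploaded


# Replace the default SIGTERM behaviour (terminate) with the no-op handler.
# signal.signal must be called from the main thread.
signal.signal(signal.SIGTERM, ignore_sigterm)

work_queue = queue.Queue()
work_queue.put("result-1.root")  # hypothetical file name

# Imitate the pod shutdown signal arriving while work is still queued.
os.kill(os.getpid(), signal.SIGTERM)

# The main worker finishes its request, then tells the uploader to stop.
work_queue.put("result-2.root")  # hypothetical file name
work_queue.put(POISON_PILL)

uploaded = uploader_loop(work_queue)
print(uploaded)
```

The design point is that shutdown is driven by the queue and not by the signal, so results already handed to the uploader are not dropped when the pod is asked to stop.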
@BenGalewsky BenGalewsky merged commit 88757a8 into develop Oct 15, 2024
75 checks passed
@BenGalewsky BenGalewsky deleted the task_ack branch October 15, 2024 21:34
2 participants